A key need articulated by Lake Tahoe managers is forecasting and understanding impacts of El Nino oscillations on the lake water quality. There are multiple plausible pathways for ENSO to impact water quality. Our goal with causal analysis is to assess not just the net effect but evidence for the importance/balance of individual pathways.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'readr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## Warning: package 'forcats' was built under R version 3.6.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rEDM)
## Warning: package 'rEDM' was built under R version 3.6.2

Data Setup

We can begin with the biogeochemical (BGC) variables analyzed in “_1_bgc_univariate.Rmd”, but need to append additional columns for key physical drivers.

load("./DATA/PROCESSED/_0 2-month BGC.Rdata")
# load("./DATA/PROCESSED/_0 3-month BGC.Rdata")

df_BGC <- df_2mo_BGC_LTP[102:323,]
# gap fill with linear interpolant
df_BGC <- mutate_at(df_BGC,-1, ~ zoo::na.approx(., maxgap = 4))
T_annual <- 6

# df_BGC <- df_3mo_BGC_LTP[68:215,]
# T_season <- 4

var_BGC <- names(df_BGC)[-1] %>% as.list()

Lake temperature, Streamflow, ENSO

load("./DATA/PROCESSED/_0 2-month T_LTP.Rdata")
var_T_LTP <- names(df_2mo_T_LTP)[-1] 

load("./DATA/PROCESSED/_0 2-month USGS.Rdata") # df_2mo_USGS
var_USGS <- names(df_2mo_USGS)[-1] %>% as.list()

load("./DATA/PROCESSED/_0 2-month indices.Rdata") # df_2mo_indices
var_indices <- names(df_2mo_indices)[-1] %>% as.list()

load("./DATA/PROCESSED/_0 2-month weather.Rdata")
var_weather <- names(df_2mo_weather)[-1] %>% as.list()

var_stations <- c(var_T_LTP,var_USGS,var_weather)
df_CCM <- df_BGC %>%
  left_join(df_2mo_T_LTP) %>%
  left_join(df_2mo_indices) %>%
  left_join(df_2mo_USGS) %>%
  left_join(df_2mo_weather)
## Joining, by = "Date"
## Joining, by = "Date"
## Joining, by = "Date"
## Joining, by = "Date"
save(df_CCM,file="./DATA/PROCESSED/_2 CCM input.Rdata")

Linear Correlations

Among Stream Flows

The 2-month averaged flow between the three USGS stations (Upper Truckee, Ward Creek, and Blackwood Creek) are highly correlated with each other at 0-lag. This would suggest either aggregating or working with the single most trust-worthy measurement. The Upper Truckee, for example, has a number of missed measurements due to ice (~1460 days).

Stream Flow and Precip

L_ccf_flow_vs_weather <- map(var_weather,function(var_i){
  map(var_USGS,function(var_j){
    df <- df_CCM %>% select(var_i,var_j) %>% filter(complete.cases(.))
    g <- ccf(df[[1]],df[[2]],main=paste(var_i,"&",var_j))
    return(g)
  })
})
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(var_i)` instead of `var_i` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(var_j)` instead of `var_j` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.

As expected there is a high correlation (~ 0.67) between the 2-month average TROA precipitation record and stream flows with a lag of about 4 months.

All Station-;level and Indices

L_ccf_stations_vs_indices <- map(var_indices,function(var_i){
  map(var_stations,function(var_j){
    df <- df_CCM %>% select(var_i,var_j) %>% filter(complete.cases(.))
    g <- ccf(df[[1]],df[[2]],main=paste(var_i,"&",var_j))
    return(g)
  })
})

Correlations between stream-flow measurements and El Nino climate indices appear significant but low. There is a consistent optimal lag of about 9 time steps. We are looking at 2-month averages in the data, so this corresponds to about an 18 month lag. Additionally, the lagged correlations are stronger with the SST-based Nino_3.4 index than the pressure-based SOI.

Biogeochemistry and Physical Variables

Now we look at the linear correlations between these physical variables and the biogeochemistry variables we studied in “Phase 1”. Based on the above statistical results, we use a single stream flow and a single oscillation index.

var_phys <- list("Q_mean_Ward","NINO_34","precip")
var_phys <- c(var_phys,var_T_LTP)

L_ccf_BGC_vs_phys <- map(var_phys,function(var_i){
  map(var_BGC,function(var_j){
    df <- df_CCM %>% select(var_i,var_j) %>% filter(complete.cases(.))
    g <- ccf(df[[1]],df[[2]],main=paste(var_i,"&",var_j))
    return(g)
  })
})

We see much higher correlations between the stream flow than climate indices to all BGC variables. The strongest correlation is to surface NO3. These correlations exhibit a strong annual cycle to them, and (at least in some cases) may not be easily distinguished from the shared seasonal periodicity of the individual time series.

Other observations: - The Secchi depth itself has a strong 2 month lagged (negative). \ - The lagged correlations with precipitation seem to be similar across the board to stream flow but uniformly weaker. \

Lagged Cross-map

Methodological notes

There is a current debate among CCM practitions over what embedding dimension to use for prediction. In some of the initial CCM demonstrations (e.g. sardine-anchovy-SST) the time series were relatively short (< 100 points). Over-fitting is a real concern in those cases; significance can be estimated from null (surrogate) analyses, but with short time series too many tunable knobs in the analysis will produce large variances in null results and hence make significance hard to establish. This has led to a convention of using univariate embedding dimensions of the predictor variable to assess CCM to a target variable. However, the optimal embedding dimension for univariate forecasting may well underestimate embedding dimension for other cross-map skill. As a compromise between over-fitting and under-estimation, we use the convention of selecting embedding dimension based off cross-mapping at a different time lag.

L_xmap_save_params <- c("columns","target","E","Tp","knn","tau","theta")

f_lag_xmap <- function(df,target_col,pred_col,Tp_list=(-6:6)){
  
  #
  E_list <- 1:15
  # theta_list <- c(0,10^seq(-2,1,by=.075))
  
  lib <- paste(1,NROW(df))
  pred <- paste(1,NROW(df))
  
  
  out_lag_xmap <- map_df(Tp_list, function(Tp_i){
    
    ## Fit E
    
    stats_simplex <- map_df(E_list,function(E_i){
      
      out_simplex_i <- Simplex(dataFrame=df,
                               target=target_col,
                               columns=pred_col,
                               lib=lib,pred=pred,
                               E=E_i,
                               Tp = Tp_i - 1,
                               parameterList=TRUE)
      
      stats_i <- compute_stats(out_simplex_i$predictions$Predictions,
                               out_simplex_i$predictions$Observations)
      
      stats_i <- bind_cols(out_simplex_i$parameters[L_xmap_save_params],
                           stats_i
      )
      
      return(stats_i)
      
    }) # map_df(E_list)
    
    stats_simplex <- suppressMessages(type_convert(stats_simplex))
    E_star <- as.integer(stats_simplex$E[which.max(stats_simplex$rho)])
    
    ## Quantify x-map skill
    
    out_xmap <- Simplex(dataFrame=df,
                        target=target_col,
                        columns=pred_col,
                        lib=lib,pred=pred,
                        E=E_star,
                        Tp = Tp_i,
                        parameterList=TRUE)
    
    stats_xmap_Tp_i <- compute_stats(out_xmap$predictions$Predictions,
                                out_xmap$predictions$Observations)
    
    stats_xmap_Tp_i <- bind_cols(out_xmap$parameters[L_xmap_save_params],
                           stats_xmap_Tp_i)
    return(stats_xmap_Tp_i)
    
  }) #map_df(Tp_list)
  
  return(out_lag_xmap)
  
}
f_lag_xmap(df_CCM,target_col = "precip",pred_col = "Secchi_Ave")
##       columns target  E Tp knn tau theta num_pred       rho        mae
## 1  Secchi_Ave precip 12 -6  13  -1     0      217 0.5191865 0.05955539
## 2  Secchi_Ave precip 13 -5  14  -1     0      215 0.5448022 0.05977021
## 3  Secchi_Ave precip 12 -4  13  -1     0      215 0.5618297 0.05823576
## 4  Secchi_Ave precip 10 -3  11  -1     0      216 0.5490260 0.05925556
## 5  Secchi_Ave precip 15 -2  16  -1     0      210 0.4916814 0.06297266
## 6  Secchi_Ave precip 13 -1  14  -1     0      211 0.4419165 0.06449742
## 7  Secchi_Ave precip 15  0  16  -1     0      208 0.4402541 0.06382296
## 8  Secchi_Ave precip 14  1  15  -1     0      209 0.4273429 0.06299240
## 9  Secchi_Ave precip 15  2  16  -1     0      208 0.4143151 0.06282227
## 10 Secchi_Ave precip 12  3  13  -1     0      211 0.4244052 0.06207129
## 11 Secchi_Ave precip 11  4  12  -1     0      212 0.4116648 0.06346366
## 12 Secchi_Ave precip 12  5  13  -1     0      211 0.3508362 0.06525794
## 13 Secchi_Ave precip 11  6  12  -1     0      212 0.3770547 0.06333324
##          rmse      perc        p_val
## 1  0.08818729 0.9878049 1.000000e-10
## 2  0.08892173 0.9878049 1.000000e-10
## 3  0.08738646 0.9879227 1.000000e-10
## 4  0.08775060 0.9857143 1.000000e-10
## 5  0.09173132 0.9854369 1.000000e-10
## 6  0.09390089 0.9856459 1.000000e-10
## 7  0.09134672 0.9855769 1.000000e-10
## 8  0.09189877 0.9808612 1.000000e-10
## 9  0.09280694 0.9759615 1.382345e-10
## 10 0.09203167 0.9715640 1.000000e-10
## 11 0.09263620 0.9669811 1.253818e-10
## 12 0.09580317 0.9620853 6.310969e-08
## 13 0.09455826 0.9575472 4.907547e-09
f_lag_xmap(df_CCM,target_col = "Q_mean_Ward",pred_col = "Secchi_Ave")
##       columns      target  E Tp knn tau theta num_pred       rho      mae
## 1  Secchi_Ave Q_mean_Ward 11 -6  12  -1     0      218 0.6822087 17.51154
## 2  Secchi_Ave Q_mean_Ward 11 -5  12  -1     0      217 0.6285283 18.31740
## 3  Secchi_Ave Q_mean_Ward  9 -4  10  -1     0      218 0.6531580 18.86617
## 4  Secchi_Ave Q_mean_Ward 14 -3  15  -1     0      212 0.6664759 18.50504
## 5  Secchi_Ave Q_mean_Ward 13 -2  14  -1     0      212 0.6648389 18.11689
## 6  Secchi_Ave Q_mean_Ward 10 -1  11  -1     0      214 0.6498665 18.48178
## 7  Secchi_Ave Q_mean_Ward 15  0  16  -1     0      208 0.6229392 19.72848
## 8  Secchi_Ave Q_mean_Ward 15  1  16  -1     0      208 0.5383153 21.14634
## 9  Secchi_Ave Q_mean_Ward 15  2  16  -1     0      208 0.5162849 21.34564
## 10 Secchi_Ave Q_mean_Ward 14  3  15  -1     0      209 0.4999022 21.51464
## 11 Secchi_Ave Q_mean_Ward 13  4  14  -1     0      210 0.4929062 21.44508
## 12 Secchi_Ave Q_mean_Ward 14  5  15  -1     0      209 0.5002103 21.21706
## 13 Secchi_Ave Q_mean_Ward 11  6  12  -1     0      212 0.5458824 20.94162
##        rmse      perc p_val
## 1  26.08961 1.0000000 1e-10
## 2  27.69326 1.0000000 1e-10
## 3  27.49310 1.0000000 1e-10
## 4  27.33945 1.0000000 1e-10
## 5  27.22924 1.0000000 1e-10
## 6  27.43617 1.0000000 1e-10
## 7  28.48237 1.0000000 1e-10
## 8  30.61325 0.9951923 1e-10
## 9  30.98915 0.9903846 1e-10
## 10 31.33009 0.9856459 1e-10
## 11 31.43253 0.9809524 1e-10
## 12 31.40850 0.9760766 1e-10
## 13 30.28754 0.9716981 1e-10
f_lag_xmap(df_CCM,target_col = "NINO_34",pred_col = "Secchi_Ave")
##       columns  target  E Tp knn tau theta num_pred       rho       mae
## 1  Secchi_Ave NINO_34 14 -6  15  -1     0      215 0.3346590 0.6748254
## 2  Secchi_Ave NINO_34 14 -5  15  -1     0      214 0.3369313 0.6755234
## 3  Secchi_Ave NINO_34 14 -4  15  -1     0      213 0.3274300 0.6749642
## 4  Secchi_Ave NINO_34 15 -3  16  -1     0      211 0.3742967 0.6608231
## 5  Secchi_Ave NINO_34 15 -2  16  -1     0      210 0.3814923 0.6527511
## 6  Secchi_Ave NINO_34 15 -1  16  -1     0      209 0.4175828 0.6359161
## 7  Secchi_Ave NINO_34 14  0  15  -1     0      209 0.4494987 0.6173945
## 8  Secchi_Ave NINO_34 14  1  15  -1     0      209 0.4488991 0.6133756
## 9  Secchi_Ave NINO_34 14  2  15  -1     0      209 0.4370465 0.6173407
## 10 Secchi_Ave NINO_34 15  3  16  -1     0      208 0.4077534 0.6371257
## 11 Secchi_Ave NINO_34 13  4  14  -1     0      210 0.3748563 0.6536461
## 12 Secchi_Ave NINO_34 15  5  16  -1     0      208 0.3981884 0.6597667
## 13 Secchi_Ave NINO_34 15  6  16  -1     0      208 0.3686924 0.6669172
##         rmse      perc        p_val
## 1  0.8578092 0.5738916 2.011002e-07
## 2  0.8545152 0.6004902 1.760826e-07
## 3  0.8549941 0.6317073 4.190081e-07
## 4  0.8333184 0.6463415 6.981276e-09
## 5  0.8276634 0.6529126 3.713621e-09
## 6  0.8066020 0.6739130 1.000000e-10
## 7  0.7887423 0.6770335 1.000000e-10
## 8  0.7862277 0.6722488 1.000000e-10
## 9  0.7912415 0.6674641 1.000000e-10
## 10 0.8093983 0.6562500 2.852365e-10
## 11 0.8189583 0.6214286 7.158344e-09
## 12 0.8175553 0.6081731 7.954198e-10
## 13 0.8272214 0.5985577 1.515019e-08
f_lag_xmap(df_CCM,target_col = "Q_mean_Ward",pred_col = "precip")
##    columns      target  E Tp knn tau theta num_pred       rho      mae     rmse
## 1   precip Q_mean_Ward 12 -6  13  -1     0      217 0.8093696 12.73509 22.84754
## 2   precip Q_mean_Ward 12 -5  13  -1     0      216 0.8020437 12.88001 23.20990
## 3   precip Q_mean_Ward 10 -4  11  -1     0      217 0.7995108 13.11491 23.57576
## 4   precip Q_mean_Ward  9 -3  10  -1     0      217 0.8141769 12.24511 21.99073
## 5   precip Q_mean_Ward  7 -2   8  -1     0      218 0.8283444 11.56906 20.78865
## 6   precip Q_mean_Ward 10 -1  11  -1     0      214 0.8246270 12.16851 22.44980
## 7   precip Q_mean_Ward  9  0  10  -1     0      214 0.8281675 12.14727 22.33370
## 8   precip Q_mean_Ward  8  1   9  -1     0      215 0.8072014 12.93174 22.83239
## 9   precip Q_mean_Ward 10  2  11  -1     0      213 0.7618113 14.94177 24.00810
## 10  precip Q_mean_Ward  9  3  10  -1     0      214 0.6216174 18.10411 28.30925
## 11  precip Q_mean_Ward 12  4  13  -1     0      211 0.6003424 19.01690 29.04250
## 12  precip Q_mean_Ward 11  5  12  -1     0      212 0.5850046 19.02265 29.45511
## 13  precip Q_mean_Ward 12  6  13  -1     0      211 0.6430390 18.10792 27.81072
##         perc p_val
## 1  1.0000000 1e-10
## 2  1.0000000 1e-10
## 3  1.0000000 1e-10
## 4  1.0000000 1e-10
## 5  1.0000000 1e-10
## 6  1.0000000 1e-10
## 7  1.0000000 1e-10
## 8  0.9953488 1e-10
## 9  0.9906103 1e-10
## 10 0.9859813 1e-10
## 11 0.9810427 1e-10
## 12 0.9764151 1e-10
## 13 0.9715640 1e-10
f_lag_xmap(df_CCM,target_col = "precip",pred_col = "Q_mean_Ward")
##        columns target  E Tp knn tau theta num_pred       rho        mae
## 1  Q_mean_Ward precip 10 -6  11  -1     0      219 0.7472134 0.04273735
## 2  Q_mean_Ward precip 10 -5  11  -1     0      218 0.7348262 0.04441566
## 3  Q_mean_Ward precip  8 -4   9  -1     0      219 0.7549070 0.04257772
## 4  Q_mean_Ward precip  7 -3   8  -1     0      219 0.7495733 0.04228058
## 5  Q_mean_Ward precip  6 -2   7  -1     0      219 0.7415178 0.04420385
## 6  Q_mean_Ward precip  5 -1   6  -1     0      219 0.6055778 0.05009311
## 7  Q_mean_Ward precip  6  0   7  -1     0      217 0.5537960 0.05386840
## 8  Q_mean_Ward precip  7  1   8  -1     0      216 0.5237827 0.05645978
## 9  Q_mean_Ward precip 14  2  15  -1     0      209 0.5389511 0.05381174
## 10 Q_mean_Ward precip  6  3   7  -1     0      217 0.5215523 0.05813725
## 11 Q_mean_Ward precip  7  4   8  -1     0      216 0.5383265 0.05503933
## 12 Q_mean_Ward precip  6  5   7  -1     0      217 0.5089291 0.05761723
## 13 Q_mean_Ward precip  9  6  10  -1     0      214 0.5484630 0.05406989
##          rmse      perc p_val
## 1  0.06911764 0.9879227 1e-10
## 2  0.07249609 0.9879808 1e-10
## 3  0.06921670 0.9881517 1e-10
## 4  0.06952981 0.9859155 1e-10
## 5  0.07018741 0.9860465 1e-10
## 6  0.08312955 0.9861751 1e-10
## 7  0.08674681 0.9861751 1e-10
## 8  0.08943961 0.9814815 1e-10
## 9  0.08584271 0.9760766 1e-10
## 10 0.09012418 0.9723502 1e-10
## 11 0.08887309 0.9675926 1e-10
## 12 0.09196119 0.9631336 1e-10
## 13 0.08540527 0.9579439 1e-10
# var_phys <- list("Q_mean_Blackwood","NINO_34","precip")
df_lag_analys_BGC_vs_phys <- map_df(var_phys,function(var_i){
  map_df(var_BGC,function(var_j){
    
    df <- df_CCM %>% select(var_i,var_j) %>% filter(complete.cases(.))
    g <- ccf(df[[1]],df[[2]],main=paste(var_i,"&",var_j),plot=FALSE)
    
    df_ij <- f_lag_xmap(df_CCM,target_col = var_i,pred_col = var_j) %>%
      select(1:9) %>%
      top_n(1,rho) %>%
      rename(rho_ccm=rho) %>%
      mutate(rho_corr=max(abs(g$acf)),lag_corr=g$lag[which.max(abs(g$acf))])
    
    return(df_ij)
      
    
  })
})

save(df_lag_analys_BGC_vs_phys,file="./RESULTS/_2 lag analysis table.Rdata")